
Snowflake as a target

Snowflake is a powerful data warehouse that can be used as a Target for data integration workflows. With Data Integration, you can easily extract data from a variety of Sources and load it into Snowflake for analysis and reporting.

Prerequisites

Ensure you have a valid Snowflake account. If you do not have one, create one. If you or your company already has a Snowflake account, you can connect an existing user or create a dedicated Data Integration user.

Setting up your Snowflake environment

The Snowflake environment for Data Integration can be configured in two ways; make sure to use only one method for granting access:

  • The first method grants an existing user the SYSADMIN role, which gives Data Integration access to all objects in the database and lets it extract from, and ingest to, new and existing tables.

  • The second method creates new or uses existing user, role, database, and warehouse objects, and gives Data Integration the ability to ingest data to and/or extract data from those objects, depending on the settings.

First method - Global permission with sysadmin role

Snowflake lets a single user hold multiple roles. In this method, you run a short script in Snowflake that assigns the SYSADMIN role to the Data Integration user.

Procedure

  1. Log in to your Snowflake console using a user with "ACCOUNTADMIN" role access.
  2. Select the Worksheet tab.
  3. Run the following commands:
   begin;

/* Set variables for the script. Select an existing user or create a new one: replace the value of var_user with
the name of the user, and do not forget to replace {password} with the password of your choice
if you are creating a new user. */

set var_user = 'RIVERY_USER';
set var_password = '{password}';

/* switch to ACCOUNTADMIN role: only an ACCOUNTADMIN can set or unset a user’s role */
USE ROLE accountadmin;

/* Create a user for Data Integration or use an existing one */
create user if not exists identifier($var_user)
password = $var_password;

GRANT ROLE SYSADMIN TO USER identifier($var_user);
ALTER USER identifier($var_user) SET DEFAULT_ROLE = SYSADMIN;
commit;
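
To confirm that the role was assigned, you can optionally run a quick check (using the user name set above):

show grants to user RIVERY_USER;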

Second method - Specific permission with custom role

Copy the following script to your Snowflake console, adjust the variables to your desired values, and run only the steps you need. Pay attention to the instructions in the script.

begin;

/* Set variables for the script. You can keep the default values suggested below or use your own.
Do not forget to replace {password} with the password of your choice if you are creating a new user. */
set var_user = 'RIVERY_USER';
set var_password = '{password}';
set var_role = 'RIVERY_ROLE';
set var_database = 'RIVERY_DATABASE';
set var_warehouse = 'RIVERY_WAREHOUSE';


/* Switch to securityadmin role:
Role that can manage any object grant globally, as well as create, monitor, and manage users and roles */
use role securityadmin;

/* Create a role for Data Integration */
create role if not exists identifier($var_role);
grant role identifier($var_role) to role SYSADMIN;

/* Create a user for Data Integration */
create user if not exists identifier($var_user)
password = $var_password
default_role = $var_role
default_warehouse = $var_warehouse;

grant role identifier($var_role) to user identifier($var_user);

/* switch to sysadmin role:
Role that has privileges to create warehouses and databases (and other objects) in an account. */
use role sysadmin;

/* Create a warehouse for Data Integration; this step is optional */
create warehouse if not exists identifier($var_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 300
auto_resume = true
initially_suspended = true;

/* Create a database for Data Integration; this step is optional */
create database if not exists identifier($var_database);


/* Grant the Data Integration role access to the warehouse */
grant USAGE
on warehouse identifier($var_warehouse)
to role identifier($var_role);

/* Grant the Data Integration role access to the database */
grant CREATE SCHEMA, MONITOR, USAGE
on database identifier($var_database)
to role identifier($var_role);

/* ATTENTION! If you only need to ingest data to and extract data from objects created by Data Integration,
and not objects that already exist, you can stop the flow here */

/* Grant access to all existing schemas on the database */
grant ALL on all SCHEMAS IN DATABASE identifier($var_database) to ROLE identifier($var_role);

/* Grant access to all existing tables on the database, might take several minutes if there are many tables */
grant ALL on ALL TABLES IN DATABASE identifier($var_database) to ROLE identifier($var_role);


commit;
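
To review what the new role can access, you can optionally run (using the default role name from the script):

show grants to role RIVERY_ROLE;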

note

If you want to add Masking Policy permissions for this user, refer to the Enforce Masking Policy section.

Snowflake security impact on data unloading

Specific Snowflake security settings may cause issues when unloading data to inline external locations, such as a Custom File Zone. While these security parameters are set to false by default, enabling them could prevent successful data unloading to inline external locations.

The affected parameters are:

ALTER ACCOUNT SET REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION = true;
ALTER ACCOUNT SET REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION = true;
ALTER ACCOUNT SET PREVENT_UNLOAD_TO_INLINE_URL = true;

Issue description

These security settings enforce stricter controls on data unloading operations:

  • REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION: Requires storage integration to be in place for creating stages.
  • REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION: Requires storage integration for operations involving stages.
  • PREVENT_UNLOAD_TO_INLINE_URL: Prevents unloading data to inline URLs, including file zones.

As a result, these settings may disrupt workflows that depend on unloading data to inline external locations.

To resolve this issue and enable data unloading to the specified Custom File Zone, refer to the Snowflake documentation for details on authorizing access under each parameter heading.
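
Before changing anything, you can check whether these parameters are currently enabled on your account; a minimal check, run with a sufficiently privileged role such as ACCOUNTADMIN:

SHOW PARAMETERS LIKE 'REQUIRE_STORAGE_INTEGRATION%' IN ACCOUNT;
SHOW PARAMETERS LIKE 'PREVENT_UNLOAD_TO_INLINE_URL' IN ACCOUNT;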

Creating network policy for Data Integration IPs (Optional)

In some cases, your Snowflake account may be access-restricted by IPs or domains. In such cases, you must add Data Integration IPs to your Snowflake Network Policy to connect successfully.

note

A network policy that lists only the Data Integration IPs may block any other IPs that are not specified in the policy. Make sure to whitelist all of your own IPs in Snowflake's network policies before creating the Data Integration policy. For more information, refer to the Network policies topic.

Create a Network Policy for Data Integration IPs

Procedure

  1. Log in to your Snowflake account.
  2. Make sure the user is set to an ACCOUNTADMIN or SYSADMIN role.
  3. In the worksheet, run the following command:
   CREATE OR REPLACE NETWORK POLICY data_integration_ips ALLOWED_IP_LIST = ('<IP_1>', '<IP_2>');
   Replace the placeholders with the most recent whitelisted Data Integration IPs, each as a quoted string.
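
If you want the policy to take effect for the whole account, activate it after creating it; a minimal sketch, assuming the policy name used above and a role with the required privileges:

ALTER ACCOUNT SET NETWORK_POLICY = data_integration_ips;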

Establishing a connection

You can connect to Snowflake using one of three methods:

Basic authentication

To connect to your Snowflake account using basic authentication:

  1. Go to the Connections menu in Data Integration.
  2. In the Snowflake connection form, enter your Connection Name.
  3. Select the Basic Authentication Type.
  4. Enter your Username and Password.
note

Snowflake passwords are restricted from including special characters such as {, }, ", ?, /, #, *, and others.

  5. Enter the name of your Warehouse (mandatory).
  6. Enter the name of your Role (optional).
note

If you leave it blank, the account's default Role is used; in that case, the default Role must have access to the selected Warehouse, or the connection fails.

  7. Input your Account Name (mandatory).
note

To obtain your Account Name, click on your account details at the bottom left of the Snowflake UI. Hovering over your account displays the option to copy your Account URL, which follows this format: https://account-name.eu-central-1.snowflakecomputing.com. In this particular case, the Account Name would be account-name.eu-central-1. If you use Snowflake on other cloud platforms, include the cloud provider in the account name by appending it with a dot. For example: account-name.eu-central-1.gcp.

  8. The connection form includes a Default Database/Schema section. This lets you specify the database and schema you want to work with, and the connection automatically remembers and uses these as the defaults (Optional).
  9. Set your Custom FileZone to save the data in your own staging area (Optional).
  10. Click Test Connection to verify that the connection works. If the connection succeeds, you can use it in Data Integration and pipe data to Snowflake in any River.

OAuth2 authentication

Use OAuth2 authentication to write data to Snowflake. This setup requires a user with the SYSADMIN role for table and schema creation.

This method supports write access only with SYSADMIN privileges.

Prerequisites

  1. Create a custom OAuth integration in Snowflake with the following:

    • OAUTH_CLIENT = CUSTOM
    • Region-specific OAUTH_REDIRECT_URI
    • Refresh token issuance enabled
    • SYSADMIN allowed for the integration (for example, via PRE_AUTHORIZED_ROLES_LIST)
  2. Grant usage of the integration to the role used by the writing user (Snowflake privileges are granted to roles, not directly to users):

  GRANT USAGE ON INTEGRATION my_oauth_integration TO ROLE SYSADMIN;

Example: OAuth Integration for Write Access

CREATE SECURITY INTEGRATION my_oauth_target
TYPE = OAUTH
ENABLED = TRUE
OAUTH_CLIENT = CUSTOM
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://console.rivery.io/api/oauthcallback/snowflake'
OAUTH_ISSUE_REFRESH_TOKENS = TRUE
OAUTH_ALLOW_NON_TLS_REDIRECT_URI = FALSE
PRE_AUTHORIZED_ROLES_LIST = ('SYSADMIN')
OAUTH_REFRESH_TOKEN_VALIDITY = 7776000; /* 90 days, expressed in seconds */
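
After creating the integration, you can retrieve the Client ID and Client Secret needed for the connection form; a minimal check, using the integration name from the example above:

DESC SECURITY INTEGRATION my_oauth_target;
SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('MY_OAUTH_TARGET');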

Redirect URIs by region

  • US: https://console.rivery.io/api/oauthcallback/snowflake
  • EU: https://eu-west-1.console.rivery.io/api/oauthcallback/snowflake
  • IL: https://il-central-1.console.rivery.io/api/oauthcallback/snowflake

Required fields

  • Client ID: From the OAuth2 app
  • Client Secret: From the OAuth2 app
  • Account Name: For example, xy12345.east-us-2.azure
  • Role: Must be SYSADMIN
  • Username: Snowflake user with SYSADMIN privileges

Procedure

  1. Enter all required details.
  2. Click Connect to Snowflake.
  3. Authenticate via the Snowflake login page.

Upon successful connection, the integration securely stores:

  • Access token – used for query execution
  • Refresh token – used for maintaining session continuity
note
  • Client ID/Secret: Must exactly match your registered Snowflake OAuth app.
  • Account Name: Use the format <org>-<account>.<region>.<cloud> (no protocol or domain).
  • Role: Must be SYSADMIN and allowed by the integration (for example, included in PRE_AUTHORIZED_ROLES_LIST).
  • Username: Must have usage and OAuth access.
  • Redirect URI: Must exactly match the URI defined in your Snowflake integration.

Key-pair authentication

To connect to your Snowflake account using Key-Pair Authentication:

  1. Open a Terminal to generate the private key. (Mac / Linux)
note

Windows is not supported.

  2. Enter the following command in your terminal window to change into your Documents folder, where the key files will be saved:
cd Documents
note

To run the command, press the Enter key.

  3. Run the following command to generate an unencrypted private key:
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
  4. Create the Public key by referencing the Private key.
note

The following command reads the private key stored in the file rsa_key.p8.

Copy and run it:

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
  5. This displays the public key in PEM (Privacy Enhanced Mail) format:
cat rsa_key.pub
  6. To assign the public key to a Snowflake user, run the ALTER USER command in your Snowflake console:
note

Replace <Username> with your Snowflake username.

alter user <Username> set rsa_public_key='';
  7. Copy the PEM output from step 5 and paste it between the apostrophes in the Snowflake console command (a completed example appears after this procedure).
  8. Go to the Connections tab in Data Integration.
  9. Fill out the Snowflake Connection form with the following information:
    a. Type in a Connection Name.
    b. Select the Key-Pair Authentication Type.
    c. Enter your Username.
    d. Enter the name of your Warehouse (mandatory).
    e. Enter the name of your Role (optional).
note

If you leave it blank, the account's default Role is used; in that case, the default Role must have access to the selected Warehouse, or the connection fails.

f. Input your Account Name (mandatory).

note

To obtain your Account Name, click on your account details at the bottom left of the Snowflake UI. Hovering over your account displays the option to copy your Account URL, which follows this format: https://account-name.eu-central-1.snowflakecomputing.com. In this particular case, the Account Name would be account-name.eu-central-1. If you use Snowflake on other cloud platforms, include the cloud provider in the account name by appending it with a dot. For example: account-name.eu-central-1.gcp.

  10. Drag the rsa_key.p8 file from your Documents folder into the designated upload area, or browse for it.
  11. The connection form includes a Default Database/Schema section. This lets you specify the database and schema you want to work with, and the connection automatically remembers and uses these as the defaults (Optional).
  12. Set your Custom FileZone to save the data in your own staging area (Optional).
  13. Click Test Connection to verify that the connection works. If the connection succeeds, you can use this connection in Data Integration. Refer to the Snowflake documentation to generate an encrypted key or for more information on the Key-Pair configuration process.
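
Relating to steps 6 and 7 above, a minimal sketch of the completed ALTER USER statement and an optional verification; the username and the truncated key value are placeholders only:

alter user MY_RIVERY_USER set rsa_public_key='MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...';
desc user MY_RIVERY_USER;

The DESC USER output includes an RSA_PUBLIC_KEY_FP property, which confirms that the key was registered.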

Replace invalid UTF-8 characters option

Replace Invalid UTF-8 Characters (IGNORE_UTF8_ERROR) dictates how invalid UTF-8 characters are handled during data loading. It determines whether these characters are replaced with the Unicode replacement character or whether the loading process stops and generates an error upon encountering invalid UTF-8 encoding.

  • TRUE: If "Replace Invalid UTF-8 Characters" is set to TRUE, Snowflake performs a one-to-one character replacement, replacing each invalid UTF-8 character with the Unicode replacement character.
  • FALSE: If "Replace Invalid UTF-8 Characters" is set to FALSE, Snowflake does not attempt to replace invalid characters but instead halts the data loading operation and reports an error upon encountering any invalid UTF-8 character encoding.
  • Default Value: The default value for "Replace Invalid UTF-8 Characters" is FALSE. By default, Snowflake does not replace invalid UTF-8 characters but rather raises an error when encountering them during data loading.

Use cases

  • TRUE: Setting "Replace Invalid UTF-8 Characters" to TRUE is helpful when it is preferable to replace invalid characters with a recognizable placeholder rather than halt the loading process.
  • FALSE: Setting "Replace Invalid UTF-8 Characters" to FALSE is beneficial when strict data validation is required, and any invalid UTF-8 characters should result in an immediate error, preventing the loading of potentially corrupted data.
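
For reference, a minimal sketch of how the same behavior looks in a raw Snowflake COPY command, using the REPLACE_INVALID_CHARACTERS file format option (Snowflake's current equivalent of the older IGNORE_UTF8_ERRORS option); the table and stage names are hypothetical:

copy into my_schema.my_table
from @my_stage
file_format = (type = csv replace_invalid_characters = true);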


Enforce masking policy

A column-level security feature called Masking Policy employs masking strategies to selectively hide plain-text data in the Target table columns.

Masking policies in Snowflake are schema-level objects; hence, a database and schema must exist in Snowflake before a masking policy can be applied to a column.

When this feature is selected, Data Integration confirms your masking policy and applies it to each location where the column appears during the River run.

Prerequisites

To use the Enforce Masking Policy feature, your Snowflake account must be set up with the necessary permissions and must be on Snowflake Enterprise Edition (or higher).

  • If you set up your Snowflake environment using the First method, you are ready to go, and your account has the required permissions.
  • If you configured Snowflake using the Second method, run the following script in the Snowflake console after specifying your Role, Database, and Schema in the relevant variables:
begin;
/* ATTENTION! Only Enterprise account holders that employ masking policies should use the script below. */

/* Set variables for script */
set var_role = 'RIVERY_ROLE';
set var_database = 'RIVERY_DATABASE';
set var_schema = 'RIVERY_SCHEMA';
/* Switch to Securityadmin role:
Role that can create and control Users and other Roles as well as manage any object. */
use role securityadmin;
/* Grant Masking Policies on a specific schema */
grant create masking policy on schema identifier($var_database).identifier($var_schema) to role identifier($var_role);
grant apply masking policy on account to role identifier($var_role);
commit;
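
For context, a minimal sketch of a masking policy as it might exist on a Target table; the policy, table, and column names are hypothetical and are shown only to illustrate what Data Integration re-applies during the River run:

/* Hypothetical example: mask email values for every role except SYSADMIN */
create masking policy if not exists RIVERY_DATABASE.RIVERY_SCHEMA.email_mask as (val string) returns string ->
  case when current_role() = 'SYSADMIN' then val else '***MASKED***' end;

alter table RIVERY_DATABASE.RIVERY_SCHEMA.customers
  modify column email set masking policy RIVERY_DATABASE.RIVERY_SCHEMA.email_mask;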

Working with the enforce masking policy

In Data Integration, the Enforce Masking Policy option is available in 3 different locations:

  • Source to Target Rivers
    • Predefined Reports / Multi-Tables Loading Mode
  • Logic Rivers

Source to Target Rivers

When working with Source to Target Rivers, you choose your preferred River Mode, and the Enforce Masking Policy option appears in one of two locations. In Predefined Reports or Multi-Tables modes, the option appears in the Table Settings of a specific table.

Any other River mode includes it under Additional Options.

Logic rivers

In Logic Rivers, you can locate it under the Target section.

Enforce masking policy limitations

  • Do not use quotation marks when defining the Masking Policy Identifier value. The value must begin with an alphabetic character and cannot contain spaces or any other special characters.
  • Your account must be configured with the proper permissions in Snowflake and listed under Snowflake Enterprise Edition.
  • If 'Enforce Masking Policy' is selected and there is no masking policy associated with the Target table, the River does not run.

Define NULL values list toggle

Data Integration ingests source data to Snowflake using a COPY command. By default (if toggle is false), the following NULL_IF clause is used:

NULL_IF=('', 'NULL', 'null', 'NUL', '0000-00-00', '0000-00-00 00:00:00') 

Each value is replaced with NULL. You can modify the list of values (without single quotes) by enabling this toggle.

note
  • For CDC Rivers, the Null_If function is applied only during the initial migration phase. After the initial migration is completed, data loaded into Snowflake from the CDC stream will no longer be affected by Null_If.
  • Making changes to existing running Rivers may require performing an initial or full migration to ensure data consistency.
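
For reference, a minimal sketch of how the default list above appears inside the Snowflake COPY command; the table, stage, and file format are hypothetical:

copy into my_schema.my_table
from @my_internal_stage
file_format = (type = csv null_if = ('', 'NULL', 'null', 'NUL', '0000-00-00', '0000-00-00 00:00:00'));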

Escape character (optional)

Escape characters enable special characters in strings to be interpreted as literal characters, rather than as control characters. In SQL, escape characters are used to change the behavior of certain characters, such as the single quote ('), the double quote ("), the backslash (\), etc.
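
As a simple illustration of the concept (standard Snowflake string syntax, independent of the Data Integration option), a single quote inside a string literal can be escaped by doubling it or by prefixing it with a backslash:

select 'O''Brien' as doubled_quote, 'O\'Brien' as backslash_escaped;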

Working with escape characters

Data Integration offers the option to use escape characters in Snowflake, making it easier to run complex SQL statements. To activate this feature, navigate to the Additional Options section located at the bottom of the Target's tab, choose the escape character of your choice from the list, and Data Integration encodes and runs the statement in Snowflake.

Snowflake connector for Python

Data Integration makes use of the Snowflake Connector for Python, which enables Python developers to establish a connection to the Snowflake data warehouse and run SQL queries.

AUTOCOMMIT in logic rivers

In Snowflake, the AUTOCOMMIT session parameter dictates whether each SQL statement is committed automatically as a transaction.

  • When AUTOCOMMIT is TRUE, every executed statement is automatically committed, eliminating the need for explicit transaction commitment. Successful statements are committed automatically, whereas failed ones are rolled back.
  • When AUTOCOMMIT is FALSE, you must commit your transactions using the COMMIT statement.

AUTOCOMMIT is TRUE

In this mode, each SQL statement is treated as a separate transaction, and it is automatically committed upon execution. For example, if you run an INSERT statement, the data is inserted, and the transaction is committed immediately.

AUTOCOMMIT is FALSE

In this mode, you need to start and commit transactions using the BEGIN and COMMIT statements. Changes made by SQL statements are not committed until you issue a COMMIT statement.
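
A minimal sketch of the difference, assuming a hypothetical table; with AUTOCOMMIT disabled, the inserted row becomes visible to other sessions only after the explicit COMMIT:

alter session set autocommit = false;
begin;
insert into my_schema.my_table (id) values (1);
commit;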

note

Setting the toggle to False is not recommended. When set to False, transactions should be fully managed within the Snowflake stored procedure. If the active transaction is initiated and concluded implicitly in the stored procedure, Snowflake rolls back the active transaction and generates an error message.

Using Snowflake stored procedures in Data Integration logic rivers

Data Integration logic rivers allow users to perform advanced data transformations and custom operations by integrating Snowflake stored procedures. This section details the process of configuring, executing, and managing Snowflake stored procedures within Data Integration logic rivers.

Prerequisites

Snowflake Database setup:

  • Ensure the Snowflake account has the stored procedure created and tested.
  • Grant appropriate permissions to the Data Integration connection user for executing the stored procedure.

Configuration steps

  1. Create or identify a stored procedure in Snowflake (an example appears after these steps)
  2. Add a Snowflake logic river in Data Integration
    • Navigate to logic rivers in Data Integration.
    • Select Snowflake as the data warehouse option.
  3. Define the procedure execution task
    • Click the Source field.
    • Select SQL Script.
    • Input the CALL statement to execute the stored procedure.
      • Example:
           CALL SAMPLE_PROCEDURE();
  4. Configure parameters (Optional). For stored procedures with input parameters, for example:
call {Snowflake_DB}.etl.update_loadstatus('param_value1', 'param_value2');
  5. Set output handling. Specify how to handle the output.
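
Referring back to step 1, a minimal sketch of a hypothetical Snowflake Scripting stored procedure matching the CALL example above, together with the grants that let the Data Integration connection role execute it; all object and role names are illustrative only:

/* Hypothetical procedure with two string parameters */
create or replace procedure etl.update_loadstatus(p_status string, p_batch string)
returns string
language sql
as
$$
begin
  update etl.load_audit set status = :p_status where batch_id = :p_batch;
  return 'done';
end;
$$;

/* Allow the Data Integration connection role to call it */
grant usage on schema etl to role RIVERY_ROLE;
grant usage on procedure etl.update_loadstatus(string, string) to role RIVERY_ROLE;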

Important notes

When a Source field uses the RECORD data type, its nested fields are flattened into multiple separate fields, using the parent field(s) as prefixes.

Example:

company RECORD
- id (string)
- name (string)

This would result in two distinct columns in Snowflake, named company_id and company_name.

Naming Conflict: Consider the following case:

company RECORD
- id (string)
- company_id (string)

Here, both the flattened company_id from the RECORD data type and the standalone company_id column are created, leading to a naming conflict.

Resolution:

  • If the Source is a REST API, the client has complete control over defining the columns and can adjust the mapping accordingly. This flexibility is ideal for this scenario, particularly when columns may have different structures.

  • If the source is a fixed-schema database or other structured source where customization is limited, you can resolve the conflict by renaming the columns. For example, you could append or prepend a unique identifier to the conflicting column names (For example, company_flattened_id and company_id). This ensures that the column names remain distinct while preserving the original data structure.

Limitation

By default, tables and schemas are created using uppercase characters. Data Integration does not support lowercase table or schema names; if you use them, they are automatically converted to uppercase.

Troubleshooting Snowflake timeout errors

If you encounter timeout issues in Snowflake, follow these steps to resolve the problem efficiently.

1. Analyze query history in Snowflake

  • Log in to the Snowflake console.
  • Navigate to the History section in the main menu.
  • Search for queries executed during the timeframe of the delay.
  • Find:
    • Long-running queries.
    • Queries stuck in Queued or Blocked status.
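
As an optional alternative to the console History page, a hedged sketch of the same check in SQL using the ACCOUNT_USAGE view (which can lag by up to about 45 minutes); the time window is illustrative:

select query_id, query_text, warehouse_name, execution_status, total_elapsed_time
from snowflake.account_usage.query_history
where start_time >= dateadd('hour', -6, current_timestamp())
order by total_elapsed_time desc
limit 50;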

2. Evaluate warehouse performance and scaling

  • Go to the Warehouses section in Snowflake.
  • Locate the warehouse used by the River.
  • Check for performance issues, such as:
    • Overloaded resources.
    • High concurrency or insufficient size.
  • If necessary:
    • Adjust the warehouse size.
    • Enable auto-scaling to handle variable workloads.
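
A hedged sketch of those adjustments in SQL; the warehouse name is illustrative, and the multi-cluster settings require a Snowflake edition that supports them:

alter warehouse RIVERY_WAREHOUSE set
  warehouse_size = 'MEDIUM'
  min_cluster_count = 1
  max_cluster_count = 3
  scaling_policy = 'STANDARD';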

3. Examine error and event logs

  • Navigate to Account > Activity > Events in Snowflake.
  • Review error and event logs for the River’s execution period.
  • Look for any warnings or errors that might indicate the root cause of the timeout.